Exploratory Data Analysis, Vancouver street trees

Fatemeh Salim

Here we are going to work with the Vancouver street trees data set. I chose to work with a smaller version of the data set that contains only 5,000 rows. Let's import the data and look at the first few rows, and then I will start the exploratory data analysis for this data set.

Questions of interest

1. Which neighbourhoods in Vancouver have the most trees?
2. Are height range and diameter of trees related?
3. Which neighbourhoods have more flowering cherry trees?
4. Neighbourhoods with tallest cherry trees?
5. Distribution of diameter of flowering cherry trees for different heights?
6. Distribution of cherry trees' diameter?
7. What are the top 20 most popular trees in Vancouver?
8. Visualize the distributions of all numerical columns for popular trees in Vancouver.
9. What is the most frequent combination of height and diameter among popular trees in Vancouver?
10. Visualize the count of all categorical aspects of popular trees in Vancouver.
11. Explore the relationship between categorical and numerical columns in the popular trees data frame.

Description & Review of Data

To answer these questions, I will need only the following columns. I kept date_planted in the dataframe for now. However, I won't use it, since more than half of the dates are missing.

Exploratory visualizations

Let's first take a look at which columns are categorical and which ones are numerical.
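A minimal sketch of how the columns could be split by type with pandas. The rows are made-up toy data, and every column name except date_planted is an assumption about how the data set is laid out.

```python
import pandas as pd

# Toy rows, made up for illustration only; column names are assumptions.
trees = pd.DataFrame({
    "neighbourhood_name": ["Kitsilano", "Marpole", "Kitsilano"],
    "common_name": ["Kwanzan flowering cherry", "Norway maple", "Red maple"],
    "diameter": [12.0, 8.5, 3.0],
    "height_range_id": [2, 3, 1],
})

# Numeric dtypes (int, float) count as numerical; everything else as categorical.
numerical_cols = trees.select_dtypes(include="number").columns.tolist()
categorical_cols = trees.select_dtypes(exclude="number").columns.tolist()

print("numerical:", numerical_cols)
print("categorical:", categorical_cols)
```

Note that height_range_id is stored as a number but behaves like an ordered category, so it may be worth treating it as categorical in some plots.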

Now we can start answering the questions we posed at the beginning of this notebook.

Question 1: Which neighbourhoods in Vancouver have the most trees?

We can tell from the above bar chart that Kensington-Cedar Cottage, Renfrew-Collingwood, and Hastings-Sunrise are the top three neighbourhoods in terms of the number of trees planted.
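The ranking behind a bar chart like this can be checked with a simple count. This is a sketch on made-up toy rows, assuming the neighbourhood column is called neighbourhood_name; the counts below are not the real ones.

```python
import pandas as pd

# Toy rows, made up for illustration; the real data has about 5,000 rows.
trees = pd.DataFrame({
    "neighbourhood_name": [
        "Renfrew-Collingwood",
        "Kensington-Cedar Cottage",
        "Kensington-Cedar Cottage",
        "Hastings-Sunrise",
        "Kensington-Cedar Cottage",
        "Renfrew-Collingwood",
    ]
})

# value_counts sorts from most to least frequent by default.
tree_counts = trees["neighbourhood_name"].value_counts()
top_three = tree_counts.head(3)
print(top_three)
```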

Question 2: Are height range and diameter of trees related?

I figured that using only the mean diameter to answer this question could hide information about how the diameters are scattered within each height range. So I decided to combine a scatter plot of all the diameter points with a line plot of the mean diameter. I can see that there is one outlier point. I am going to remove it and redraw the chart to get a better understanding.
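A sketch of the outlier removal and the per-height mean that the line plot is based on. The text does not say how the outlier was identified, so dropping the row with the largest diameter is an assumption, as are the column names and the toy values.

```python
import pandas as pd

# Toy rows, made up for illustration; the last diameter plays the outlier.
trees = pd.DataFrame({
    "height_range_id": [1, 1, 2, 2, 3, 3],
    "diameter": [3.0, 5.0, 8.0, 10.0, 14.0, 300.0],
})

# Assumption: the single outlier is the row with the largest diameter.
outlier_index = trees["diameter"].idxmax()
trimmed = trees.drop(index=outlier_index)

# Mean diameter per height range, the quantity drawn as the line.
mean_diameter = trimmed.groupby("height_range_id")["diameter"].mean()
print(mean_diameter)
```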

From this plot, we can tell that taller trees, on average, have a bigger diameter. However, I can tell from the scatter plot that a good number of tall trees have a smaller diameter.

Calculating the correlation shows a positive relationship between these two columns.
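A sketch of how such a correlation could be computed. The text does not say which coefficient was used, so both the default Pearson coefficient and a rank-based one are shown; the rank-based version may suit height_range_id better because it is an ordered category. Column names and values are assumptions.

```python
import pandas as pd

# Toy rows, made up for illustration only.
trees = pd.DataFrame({
    "height_range_id": [1, 1, 2, 2, 3, 4],
    "diameter": [3.0, 5.0, 8.0, 6.0, 14.0, 20.0],
})

# Pearson correlation (the pandas default).
pearson_r = trees["height_range_id"].corr(trees["diameter"])

# Rank-based correlation: Pearson on the ranks, which is Spearman's coefficient.
rank_r = trees["height_range_id"].rank().corr(trees["diameter"].rank())

print(f"Pearson: {pearson_r:.2f}, rank-based: {rank_r:.2f}")
```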

Now let's explore flowering cherry trees. These trees are beautiful in spring, and photographers and tourists can use these locations. Here I am going to answer question 3.

Question 3: Which neighbourhoods have more flowering cherry trees?

Question 4: Neighbourhoods with tallest cherry trees?

There are 5 neighbourhoods that have a few trees in height range 4: Mount Pleasant, Dunbar-Southlands, Kerrisdale, Fairview, and West Point Grey. However, each of these neighbourhoods has fewer than 5 such trees. We can also see that the Victoria-Fraserview neighbourhood has 19 tall cherry trees in height range 3.
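Counts like these can be read from a neighbourhood by height range table. The sketch below assumes that flowering cherry trees can be picked out by matching "flowering cherry" in a common_name column; that filter, the column names, and the toy rows are all assumptions.

```python
import pandas as pd

# Toy rows, made up for illustration only.
trees = pd.DataFrame({
    "neighbourhood_name": [
        "Fairview", "Fairview", "Victoria-Fraserview",
        "Victoria-Fraserview", "Kerrisdale",
    ],
    "common_name": [
        "Kwanzan flowering cherry", "Norway maple", "Akebono flowering cherry",
        "Kwanzan flowering cherry", "Kwanzan flowering cherry",
    ],
    "height_range_id": [4, 3, 3, 3, 2],
})

# Keep only rows whose common name mentions flowering cherry.
is_cherry = trees["common_name"].str.contains("flowering cherry", case=False, na=False)
cherry = trees[is_cherry]

# Rows are neighbourhoods, columns are height ranges, cells are tree counts.
height_table = pd.crosstab(cherry["neighbourhood_name"], cherry["height_range_id"])
print(height_table)
```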

Question 5: Distribution of diameter of flowering cherry trees for different heights?

From the plot above, we can tell that cherry trees with a diameter bigger than 25 are among the taller trees.

We can tell that the most common diameter differs between height ranges among cherry trees. For example, the most common diameter for shorter cherry trees is 5, whereas the most common diameter for the tallest cherry trees is about 32 inches.

However, I can tell from this density plot that for trees in height range 4 there are not enough examples to draw an accurate conclusion, since the density plot seems to be cut off at both ends.

Question 6: Distribution of cherry trees' diameter?

We can tell from the above plot that Killarney has the thicker trees, both in terms of the median diameter and the number of thicker trees. From the bar chart, or by hovering the mouse over the box plot, we can see that the maximum diameter for this neighbourhood is 34. The bar chart shows that Victoria-Fraserview has trees whose diameter reaches 46. However, the box plot shows that the median tree diameter in this neighbourhood is lower than in Killarney. What caused this neighbourhood to show a taller bar in the bar chart is a few trees that went above 30 inches in diameter.
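The difference between what the bar chart and the box plot show comes down to maximum versus median. A small sketch with made-up numbers and placeholder neighbourhood names shows how a few very thick trees can raise the maximum while the median stays low.

```python
import pandas as pd

# Toy rows with placeholder neighbourhoods; values are made up for illustration.
cherry = pd.DataFrame({
    "neighbourhood_name": ["A", "A", "A", "B", "B", "B"],
    "diameter": [20.0, 24.0, 30.0, 6.0, 8.0, 46.0],
})

# Median (the box plot's centre line) and maximum per neighbourhood.
summary = cherry.groupby("neighbourhood_name")["diameter"].agg(["median", "max"])
print(summary)
```

Here neighbourhood B has the larger maximum but the smaller median, which is the same pattern described above.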

Let's explore this new data frame that I made.
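How this data frame was built is not shown in the text. One possible construction, assuming that "popular" means the most frequent values of a common_name column, is sketched below; the helper keep_most_common is hypothetical and the rows are made up.

```python
import pandas as pd

def keep_most_common(df, column, n):
    """Return the rows of df whose value in column is among the n most frequent."""
    top_values = df[column].value_counts().head(n).index
    return df[df[column].isin(top_values)]

# Toy rows, made up for illustration only.
trees = pd.DataFrame({
    "common_name": [
        "Kwanzan flowering cherry", "Kwanzan flowering cherry",
        "Kwanzan flowering cherry", "Norway maple", "Norway maple", "Red maple",
    ],
    "diameter": [10.0, 12.0, 9.0, 20.0, 18.0, 6.0],
})

# With the real data n would be 20; the toy data only supports a smaller n.
popular = keep_most_common(trees, "common_name", n=2)
print(popular["common_name"].value_counts())
```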

Let's first find the categorical and numerical columns in the common trees dataframe.

Now we can use this information to answer question 8 and visualize the distributions of all numerical columns in the common trees dataframe. This will surely help us understand the data better.

The plot of tree diameter has at least two peaks. Most of the trees have a diameter between 2 and 4 inches and fall in height ranges 2 and 3.

From the heat map above, the most frequent combination of height and diameter among popular trees in Vancouver is a diameter between 2 and 4 and a height range id of 1.
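The same answer can be read off numerically by binning the diameters and counting each height and diameter-bin pair. This is a sketch on made-up rows; the 2-inch bin width and the column names are assumptions rather than the settings used for the heat map.

```python
import pandas as pd

# Toy rows, made up for illustration only.
popular = pd.DataFrame({
    "height_range_id": [1, 1, 1, 2, 2, 3],
    "diameter": [3.0, 3.5, 2.5, 3.0, 7.0, 9.0],
})

# Bin diameters into right-closed intervals: (0, 2], (2, 4], and so on.
popular["diameter_bin"] = pd.cut(popular["diameter"], bins=[0, 2, 4, 6, 8, 10])

# Count rows per (height range, diameter bin) pair, skipping empty pairs.
combo_counts = popular.groupby(
    ["height_range_id", "diameter_bin"], observed=True
).size()

# The index label of the largest count is the most frequent combination.
most_frequent = combo_counts.idxmax()
print(most_frequent, combo_counts.max())
```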

I am hoping to get a better understanding of the most frequent species, tree name, and genus of all popular trees by answering this question.

From these repeated plots, we can tell that Cerasifera is the most common species, flowering cherry is the most common tree, and Prunus is the most common genus. Renfrew-Collingwood has the most popular trees in Vancouver.

Answering this question will help us better understand how height and diameter change across different species and genera of trees, as well as across different neighbourhoods.

This exploration of categorical and numerical columns leads to very interesting results. Among the species, Platanoides has the largest median diameter and median height. The median tree thickness is largest in the Marpole neighbourhood.
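Medians like these can be tabulated per category and sorted. The sketch below uses placeholder species labels and made-up values, and assumes columns named species_name, diameter, and height_range_id.

```python
import pandas as pd

# Toy rows with placeholder species; values are made up for illustration.
popular = pd.DataFrame({
    "species_name": ["species_a", "species_a", "species_b", "species_b", "species_c"],
    "diameter": [20.0, 24.0, 6.0, 10.0, 12.0],
    "height_range_id": [4, 3, 2, 2, 3],
})

# Median diameter and median height range per species, thickest first.
medians = (
    popular.groupby("species_name")[["diameter", "height_range_id"]]
    .median()
    .sort_values("diameter", ascending=False)
)
print(medians)
```

Replacing species_name with a genus or neighbourhood column would give the same kind of table for the other categorical columns.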

Concluding remarks

This section explains which five plots I am going to include in my report and how they will be changed for the audience.

1: For the plot from question 1, I can add a more explanatory title and subtitle, remove the x axis, and instead show the tree count of each neighbourhood beside its related bar.

2: For the second plot from question 2, the axis labels can be improved. A tooltip can be added to the line chart to show that the line marks the mean diameter. An explanatory title should also be added.

3: For the plot from question 4, a title should be added.

4: For the plots from question 6, the axis titles and plot title need work.

5: For the plot from question 9, the y axis ticks can be changed to integers. The axis titles and plot title need work. For a public audience, I think I would change this plot to a square plot in which the size and colour of the squares reflect the count of observations. That is probably easier to understand.